Recovering evolutionary trees under a more realistic model of sequence evolution.

Authors

  • P J Lockhart
  • M A Steel
  • M D Hendy
  • D Penny
Abstract

We report a new transformation, the LogDet, that is consistent for sequences with differing nucleotide composition and that have arisen under simple but asymmetric stochastic models of evolution. This transformation is required because existing methods tend to group sequences on the basis of their nucleotide composition, irrespective of their evolutionary history. This effect of differing nucleotide frequencies is illustrated by using a tree-selection criterion on a simple distance measure defined solely on the basis of base composition, independent of the actual sequences. The new LogDet transformation uses determinants of the observed divergence matrices and works because multiplication of determinants (real numbers) is commutative, whereas multiplication of matrices is not, except in special symmetric cases. The use of determinants thus allows more general models of evolution with asymmetric rates of nucleotide change. The transformation is illustrated on a theoretical data set (where existing methods select the wrong tree) and with three biological data sets: chloroplasts, birds/mammals (nuclear), and honeybees (mitochondrial). The LogDet transformation reinforces the logical distinction between transformations on the data and tree-selection criteria. The overall conclusions from this study are that irregular A,C,G,T compositions are an important and possibly general cause of patterns that can mislead tree-reconstruction methods, even when high bootstrap values are obtained. Consequently, many published studies may need to be reexamined.
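
As a rough illustration of the idea in the abstract, the sketch below computes a LogDet-style distance for one pair of aligned sequences. It assumes the commonly cited unnormalized form, d_xy = -ln det(F_xy), where F_xy is the 4x4 divergence matrix of joint nucleotide frequencies; the abstract itself does not spell out the formula, so this form, and the function names `divergence_matrix`, `determinant`, and `logdet_distance`, are assumptions made for the example rather than the authors' implementation.

```python
import math

NUCS = "ACGT"

def divergence_matrix(seq_x, seq_y):
    """4x4 matrix F where F[i][j] is the fraction of aligned sites with
    nucleotide i in seq_x and nucleotide j in seq_y."""
    if len(seq_x) != len(seq_y):
        raise ValueError("sequences must be aligned to the same length")
    index = {n: k for k, n in enumerate(NUCS)}
    counts = [[0.0] * 4 for _ in range(4)]
    used = 0
    for a, b in zip(seq_x.upper(), seq_y.upper()):
        if a in index and b in index:  # skip gaps and ambiguity codes
            counts[index[a]][index[b]] += 1.0
            used += 1
    if used == 0:
        raise ValueError("no comparable sites")
    return [[c / used for c in row] for row in counts]

def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0  # treated as singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def logdet_distance(seq_x, seq_y):
    """Assumed unnormalized form: -ln(det F_xy)."""
    d = determinant(divergence_matrix(seq_x, seq_y))
    if d <= 0.0:
        raise ValueError("determinant not positive; LogDet undefined")
    return -math.log(d)
```

In this unnormalized form the value for two identical sequences is not zero, since it also reflects base composition; normalized variants exist that subtract a composition term. The distance is undefined when the determinant is not positive, which can happen for short or highly divergent sequences.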

Related resources

Disk-Covering, a Fast-Converging Method for Phylogenetic Tree Reconstruction

The evolutionary history of a set of species is represented by a phylogenetic tree, which is a rooted, leaf-labeled tree, where internal nodes represent ancestral species and the leaves represent modern day species. Accurate (or even boundedly inaccurate) topology reconstructions of large and divergent trees from realistic length sequences have long been considered one of the major challenges i...

When is it safe to use an oversimplified substitution model in tree-making?

The choice of an "optimal" mathematical model for computing evolutionary distances from real sequences is not currently supported by easy-to-use software applicable to large data sets, and an investigator frequently selects one of the simplest models available. Here we study properties of the observed proportion of differences (p-distance) between sequences as an estimator of evolutionary dista...

Comparison of models for nucleotide substitution used in maximum-likelihood phylogenetic estimation.

Using real sequence data, we evaluate the adequacy of assumptions made in evolutionary models of nucleotide substitution and the effects that these assumptions have on estimation of evolutionary trees. Two aspects of the assumptions are evaluated. The first concerns the pattern of nucleotide substitution, including equilibrium base frequencies and the transition/transversion-rate ratio. The sec...

Recovering the Tree-Like Trend of Evolution Despite Extensive Lateral Genetic Transfer: A Probabilistic Analysis

Lateral gene transfer (LGT) is a common mechanism of nonvertical evolution, during which genetic material is transferred between two more or less distantly related organisms. It is particularly common in bacteria where it contributes to adaptive evolution with important medical implications. In evolutionary studies, LGT has been shown to create widespread discordance between gene trees as genom...

Evolutionary Distance: Estimation

Estimation of evolutionary distances between protein and deoxyribonucleic acid (DNA) sequences is important for constructing phylogenetic trees, dating species’s divergences and understanding the mechanisms of evolution of genes, proteins and populations. Many estimators of evolutionary distances have been developed in the past 30 years. They are based on different models of sequence evolution ...

Journal:
  • Molecular biology and evolution

Volume 11, Issue 4

Pages: -

Publication date: 1994